Automatic Individual Identification of Patterned Solitary Species Based on Unlabeled Video Data
The manual processing and analysis of videos from camera traps is
time-consuming and includes several steps, ranging from the filtering of
falsely triggered footage to identifying and re-identifying individuals. In
this study, we developed a pipeline to automatically analyze videos from camera
traps to identify individuals without requiring manual interaction. This
pipeline applies to animal species with uniquely identifiable fur patterns and
solitary behavior, such as leopards (Panthera pardus). We assumed that the same
individual was seen throughout one triggered video sequence. With this
assumption, multiple images could be assigned to an individual for the initial
database filling without pre-labeling. The pipeline was based on
well-established components from computer vision and deep learning,
particularly convolutional neural networks (CNNs) and scale-invariant feature
transform (SIFT) features. We augmented this basis by implementing additional
components to substitute otherwise required human interactions. Based on the
similarity between frames from the video material, clusters were formed that
represented individuals, bypassing the open-set problem of the unknown total
population. The pipeline was tested on a dataset of leopard videos collected by
the Pan African Programme: The Cultured Chimpanzee (PanAf) and achieved a
success rate of over 83% for correct matches between previously unknown
individuals. The proposed pipeline can become a valuable tool for future
conservation projects based on camera trap data, reducing the manual analysis
needed for individual identification when labeled data is unavailable.
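The clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each frame has already been reduced to a fixed-length descriptor (a hypothetical stand-in for the CNN and SIFT features), averages the descriptors of each video under the one-individual-per-video assumption, and merges videos whose cosine similarity reaches a hypothetical threshold. Because clusters emerge from the threshold, the number of individuals never has to be specified in advance.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cluster_videos(videos, threshold=0.9):
    """Group videos into individuals.

    videos: dict mapping a video id to a list of frame descriptors.
    threshold: assumed similarity cut-off (hypothetical value).
    Returns a dict mapping each video id to an integer cluster label.
    """
    ids = sorted(videos)
    # One individual per video: summarize every video by its mean descriptor.
    centroids = {vid: mean_vector(videos[vid]) for vid in ids}
    parent = {vid: vid for vid in ids}

    def find(vid):
        # Union-find lookup with path halving.
        while parent[vid] != vid:
            parent[vid] = parent[parent[vid]]
            vid = parent[vid]
        return vid

    # Link every pair of videos that looks similar enough.
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine(centroids[first], centroids[second]) >= threshold:
                parent[find(second)] = find(first)

    # Number the connected components in order of first appearance.
    labels = {}
    return {vid: labels.setdefault(find(vid), len(labels)) for vid in ids}
```

Linking videos through connected components is one simple design choice. It can chain dissimilar videos together through intermediate matches, so a stricter linkage rule may be preferable on real data.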
Semi-Supervised Learning Approach for Fine Grained Human Hand Action Recognition in Industrial Assembly
Industrial manual assembly has so far been hard to imagine without humans, owing to their flexibility
and adaptability. But the assembly process does not always benefit from human intervention. Assemblers are
error-prone because of disturbance, distraction, or inattention, so they need intelligent support, and the
constantly recurring, repetitive data patterns of assembly work make this task well suited to deep learning.
However, labels for the data are often scarce. In this work, a
spatio-temporal transformer model is used to address the shortage of labels in an industrial
setting. A pseudo-labeling method from the field of semi-supervised transfer learning is applied for model training,
and the entire architecture is adapted to the fine-grained recognition of human hand actions in assembly. This
implementation significantly improves the generalization of the model during training across different
variations of strong and weak classes from the ground truth and shows that deep
learning technologies can be used in an industrial setting, even with few labels. Beyond the main goal of improving
the model's generalization while using less data during training and exploring different variations
of appropriate ground truth and new classes, the recognition capabilities of the model are improved by adding
convolution to the temporal embedding layer, which increases the test accuracy by over 5% compared to a similar
predecessor model.
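The pseudo-labeling step mentioned in this abstract can be sketched as follows. The abstract does not say how pseudo-labels are selected, so this sketch assumes a common variant: an unlabeled sample receives the model's most probable class as its label only when that probability reaches a confidence threshold. The function name and the threshold value are hypothetical.

```python
def select_pseudo_labels(probabilities, threshold=0.95):
    """Pick confident predictions on unlabeled data as pseudo-labels.

    probabilities: list of per-sample class probability lists.
    threshold: assumed minimum confidence (hypothetical value).
    Returns a list of (sample_index, class_index) pairs.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        # Index of the most probable class for this sample.
        best_class = max(range(len(probs)), key=probs.__getitem__)
        if probs[best_class] >= threshold:
            selected.append((index, best_class))
    return selected
```

The selected pairs would then be added to the labeled set for the next training round, while low-confidence samples stay unlabeled.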